Space-Efficient Indexing of XML Documents for Content-Only Retrieval
Author
Abstract
More and more documents are stored in semistructured formats like XML. In contrast to traditional information retrieval, the documents can become quite large, and it is often desirable to retrieve not complete documents, but isolated elements that satisfy an information need. To make this possible, the index structures from traditional information retrieval must be adapted to semistructured documents (specifically XML) so that term occurrences can be pinpointed to specific elements inside the documents. This paper explores several enhancements of the index structures and evaluates the advantages and drawbacks of the different versions with respect to index size and retrieval time.
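As a rough illustration of what it means to pinpoint term occurrences to specific elements, the following minimal sketch builds an inverted index whose postings carry an element path in addition to the document identifier. It is not the paper's index structure: the function name `build_element_index`, the path notation, and the whitespace tokenization are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_element_index(doc_id, xml_text):
    """Map each term to a list of (doc_id, element_path) postings.

    Hypothetical helper for illustration; a term is attributed to the
    innermost element that directly contains it.
    """
    index = defaultdict(list)

    def add_terms(text, path):
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for term in (text or "").lower().split():
            posting = (doc_id, path)
            if posting not in index[term]:
                index[term].append(posting)

    def visit(elem, path):
        add_terms(elem.text, path)
        counts = defaultdict(int)
        for child in elem:
            # Number siblings per tag so every path identifies one element.
            counts[child.tag] += 1
            visit(child, f"{path}/{child.tag}[{counts[child.tag]}]")
            # Text following a child still belongs to the parent element.
            add_terms(child.tail, path)

    root = ET.fromstring(xml_text)
    visit(root, f"/{root.tag}[1]")
    return index


doc = (
    "<article><title>XML indexing</title>"
    "<sec><p>inverted index</p><p>XML retrieval</p></sec></article>"
)
index = build_element_index("d1", doc)
print(index["xml"])
```

Storing a full path string in every posting is simple but wasteful, which is the kind of index-size versus retrieval-time trade-off the abstract refers to; a more compact encoding would replace the path with a numeric element identifier.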
Similar resources
Content oriented retrieval on document centric XML
XML is the perfect format for storing (mostly) textual documents in a digital library; its flexibility enables users to store both highly structured data (like database records) and free text in the same document. The data-centric parts can be searched using query languages like XPath and XQuery, where exact conditions on the structure can be imposed. For digital libraries, however, it is impor...
A methodology for indexing and retrieval of information from XML document
The XML documents having markup elements are increasing vividly on the World Wide Web. Now the exigency is that how these documents could be used for the welfare of our posterity so that indexing and retrieving of these documents can be made more accurate and precise. The endeavors to make the standards for indexing and retrieving of XML documents are burgeoning. Currently the structured docume...
Content-Aware DataGuides: Interleaving IR and DB Indexing Techniques for Efficient Retrieval of Textual XML Data
Not only since the advent of XML, many applications call for efficient structured document retrieval, challenging both Information Retrieval (IR) and database (DB) research. Most approaches combining indexing techniques from both fields still separate path and content matching, merging the hits in an expensive join. This paper shows that retrieval is significantly accelerated by processing text...
Searching XML Documents - Preliminary Work
Structured document retrieval aims at exploiting the structure together with the content of documents to improve retrieval results. Several aspects of traditional information retrieval applied on flat documents have to be reconsidered. These include in particular, document representation, storage, indexing, retrieval, and ranking. This paper outlines the architecture of our system and the adapt...
Indexation des documents XML : Un DataGuide annoté avec un index de contenu
Indexing in classical information retrieval brings few tools for the treatment of the semi-structured documents: the representations of documents in information retrieval were conceived for flat and homogeneous documents. They are not adapted to the simultaneous treatment of the structure and the contents. Several approaches of indexing semi-structured data was proposed to resolve this new chal...
Journal:
- Datenbank-Spektrum
Volume 7, Issue
Pages -
Publication date: 2007